Using Color

PH345: Winter 2025

Phil Boonstra

Case Study: FARS 2022

Fatality Analysis Reporting System (FARS)

  • Data from National Highway Traffic Safety Administration (NHTSA) on fatal traffic accidents in the US

  • Collected and reported annually since 1975. The most recent year available is 2022

For each state, I calculated the following:

  • Number of crashes per 1000 population (all_crash_per_capita)
  • Number of crashes on nice days per 1000 population (nice_crash_per_capita)
  • Number of crashes on “not nice” days per 1000 population (not_nice_crash_per_capita)
  • Number of crashes on other days per 1000 population (other_crash_per_capita)
  • Ratio of crashes on nice days vs crashes on not nice days (ratio_nice_not_nice)

Nice day: Clear or Cloudy weather, and Daylight light conditions

Not nice day: any of these weather conditions (Blowing Sand, Soil, Dirt; Blowing Snow; Fog, Smog, Smoke; Freezing Rain or Drizzle; Rain; Severe Crosswinds; Sleet or Hail; Snow), or any of these light conditions (Dark - Not Lighted; Dawn; Dusk)

mutate(
  # Nice: good weather AND daylight
  nice_conditions =
    WEATHERNAME %in% c("Clear", "Cloudy") & LGT_CONDNAME == "Daylight",
  # Not nice: bad weather OR poor light
  not_nice_conditions =
    WEATHERNAME %in% c("Blowing Sand, Soil, Dirt",
                       "Blowing Snow",
                       "Fog, Smog, Smoke",
                       "Freezing Rain or Drizzle",
                       "Rain",
                       "Severe Crosswinds",
                       "Sleet or Hail",
                       "Snow") |
    LGT_CONDNAME %in% c("Dark - Not Lighted",
                        "Dawn",
                        "Dusk")
)
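One way the two flags could be rolled up into the per-state measures listed earlier. This is a minimal sketch on toy data, not the code used for the slides: the `pop` column (state population) and all values are made up, and "other" is assumed to mean neither nice nor not nice.

```r
library(dplyr)

# Toy crash-level data; `pop` (state population) is a hypothetical column
crashes <- tibble(
  STATENAME           = c("A", "A", "A", "B", "B", "B"),
  nice_conditions     = c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE),
  not_nice_conditions = c(FALSE, TRUE, FALSE, FALSE, FALSE, TRUE),
  pop                 = c(2000, 2000, 2000, 4000, 4000, 4000)
)

state_summary <- crashes %>%
  group_by(STATENAME) %>%
  summarize(
    # Crashes per 1000 population, overall and by condition
    all_crash_per_capita      = 1000 * n() / first(pop),
    nice_crash_per_capita     = 1000 * sum(nice_conditions) / first(pop),
    not_nice_crash_per_capita = 1000 * sum(not_nice_conditions) / first(pop),
    # Assumption: "other" = neither flag is TRUE
    other_crash_per_capita    =
      1000 * sum(!nice_conditions & !not_nice_conditions) / first(pop),
    ratio_nice_not_nice       = nice_crash_per_capita / not_nice_crash_per_capita
  )

state_summary
```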

  • Data are available on Canvas > Datasets > FARS: fars2022_summary.csv and fars2022_regional_summary.csv

Three big problems with this plot:

  1. Too many colors to be helpful
  2. Color scheme encourages comparisons between alphabetically adjacent states
  3. Lots of plot space taken up by the legend

What the previous plot looks like to someone with 80% deuteranopia (can’t see green well)

https://bioapps.byu.edu/colorblind_image_tester
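A similar check can be done on a palette directly in R. The sketch below assumes the colorspace and scales packages are installed; `deutan()` simulates green-deficient vision at a chosen severity. Its model is not necessarily the one used by the web tool above, so results may differ somewhat.

```r
library(colorspace)  # deutan(), swatchplot()
library(scales)      # hue_pal(): ggplot2's default discrete colors

# Six default ggplot2 colors and their simulated appearance at severity 0.8
default_cols <- hue_pal()(6)
deutan_cols  <- deutan(default_cols, severity = 0.8)

# Side-by-side swatches: original on top, simulated below
swatchplot("Original" = default_cols, "Simulated (0.8)" = deutan_cols)
```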

Grouping by region instead of individual states

Labeling outlying states explicitly using ggrepel R package (Slowikowski, 2024)

  1. Something is clearly wrong with the Virginia data on daylight accidents (Virginia is dropped from the plot below). Kansas and Vermont are also worth a closer look
  2. More fatal crashes in the South; fewer in the Northeast
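A minimal sketch of labeling only a few outlying points with ggrepel. The data are made up, the choice of axes is an assumption about the plot, and the cutoff used to flag an "outlier" is hypothetical.

```r
library(ggplot2)
library(ggrepel)

# Toy state-level data (values are invented)
toy <- data.frame(
  STATENAME = c("State A", "State B", "State C", "State D"),
  region    = c("South", "South", "Northeast", "Northeast"),
  nice_crash_per_capita     = c(0.05, 0.06, 0.02, 0.10),
  not_nice_crash_per_capita = c(0.07, 0.08, 0.03, 0.02)
)

# Hypothetical rule: label states with an unusually high nice-day rate
outliers <- subset(toy, nice_crash_per_capita > 0.09)

p <- ggplot(toy, aes(x = nice_crash_per_capita,
                     y = not_nice_crash_per_capita,
                     color = region)) +
  geom_point() +
  # Labels are drawn only for the outlier rows and nudged off the points
  geom_text_repel(data = outliers, aes(label = STATENAME), show.legend = FALSE)
p
```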

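library(dplyr)  # filter(), %>%
library(purrr)  # map_df()

# Return the rows that lie on the convex hull of
# (nice_crash_per_capita, not_nice_crash_per_capita)
find_hull <- function(fars2022_summary) {
  fars2022_summary[
    chull(fars2022_summary$nice_crash_per_capita,
          fars2022_summary$not_nice_crash_per_capita), 
  ]
}

# Feed this into geom_polygon: one hull per region, Virginia excluded
hulls <- 
  fars2022_summary %>% 
  filter(STATENAME != "Virginia") %>%
  split(.$region) %>%
  map_df(.f = find_hull)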
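How a hull dataset like this could be drawn: a sketch on simulated data, using base R in place of the purrr pipeline so it runs on its own. The regions and values are invented.

```r
library(ggplot2)

# Simulated points for two regions
set.seed(1)
toy <- data.frame(
  region = rep(c("South", "Northeast"), each = 10),
  nice_crash_per_capita     = c(rnorm(10, 0.06, 0.01), rnorm(10, 0.03, 0.01)),
  not_nice_crash_per_capita = c(rnorm(10, 0.08, 0.01), rnorm(10, 0.04, 0.01))
)

# Keep only the rows on each region's convex hull
toy_hulls <- do.call(rbind, lapply(split(toy, toy$region), function(d) {
  d[chull(d$nice_crash_per_capita, d$not_nice_crash_per_capita), ]
}))

p <- ggplot(toy, aes(x = nice_crash_per_capita,
                     y = not_nice_crash_per_capita,
                     color = region, fill = region)) +
  # One translucent polygon per region (containment), points on top
  geom_polygon(data = toy_hulls, alpha = 0.2) +
  geom_point()
p
```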

Defining Color

Color can be defined by its hue (the defining attribute, e.g. blue or red), lightness (the brightness), and chroma (the richness of a color).

  • Three hues (red, green, blue)
  • Three lightnesses (top [not bright], middle, bottom [bright])
  • Ten chromas (left [not intense] to right [intense])
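These three attributes map onto the arguments of base R's `hcl()` function. A minimal sketch follows; the particular hue angles, chroma values, and lightness values are illustrative choices, not the ones used in the figure.

```r
# hcl() is in grDevices, which is loaded by default

# Vary hue only (roughly red, green, blue)
hues <- hcl(h = c(0, 120, 240), c = 60, l = 60)

# Vary lightness only (dark to bright)
lightnesses <- hcl(h = 240, c = 40, l = c(30, 60, 90))

# Vary chroma only (gray to intense), ten steps
chromas <- hcl(h = 240, c = seq(0, 90, length.out = 10), l = 60)

# Quick look at the chroma ramp
barplot(rep(1, 10), col = chromas, border = NA, axes = FALSE)
```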

Cynthia Brewer (1960-)

American cartographer and professor of geography at Penn State University

Pioneering work in developing color schemes for maps (https://colorbrewer2.org)

Recipient of the Carl Mannerfelt Gold Medal in 2023

Color scales

Instead of choosing individual colors, typically use predefined ‘palette’ of colors. Three types of palettes:

  • Sequential: colors follow a gradient from low to high
  • Qualitative: hue-based palettes for categorical data
  • Diverging: two sequential palettes “pasted together”
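One example of each palette type, pulled from the RColorBrewer package (assumed installed). The three palettes named here are just examples, and `brewer.pal.info` reports each palette's type.

```r
library(RColorBrewer)

sequential_cols  <- brewer.pal(n = 5, name = "Blues")  # light to dark
qualitative_cols <- brewer.pal(n = 5, name = "Set2")   # distinct hues
diverging_cols   <- brewer.pal(n = 5, name = "RdBu")   # two ramps meeting in the middle

# Look up the type of each palette: "seq", "qual", or "div"
pal_types <- as.character(
  brewer.pal.info[c("Blues", "Set2", "RdBu"), "category"]
)
pal_types
```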

Many palettes are available in R, including those built into ggplot2

https://colorbrewer2.org/

Default color scheme in ggplot2

library(ggplot2)
library(datasauRus)
dino_plot <-
  ggplot(datasaurus_dozen) +
  geom_point(aes(x = x, y = y, color = dataset), size = 1) + 
  facet_wrap(vars(dataset), ncol = 5) +
  labs(x = NULL, y = NULL) + 
  guides(color = "none") +  # hide the legend (color = FALSE is deprecated)
  theme(text = element_text(size = 18)) 
dino_plot

Set3 palette (qualitative)

library(datasauRus)
dino_plot +
scale_color_brewer(palette = "Set3") 

Dark2 palette (qualitative)

library(datasauRus)
dino_plot +
scale_color_brewer(palette = "Dark2") 

Spectral palette (diverging)

library(datasauRus)
dino_plot +
scale_color_brewer(palette = "Spectral") 

BrBG palette (diverging)

library(datasauRus)
dino_plot +
scale_color_brewer(palette = "BrBG") 
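A quick check before mapping a Brewer palette to a categorical variable: does the palette have enough colors? This sketch compares the number of datasets in `datasaurus_dozen` with the maximum size of the four palettes used above. When a palette is too small, expect ggplot2 to warn, and the extra groups will not get a palette color.

```r
library(RColorBrewer)
library(datasauRus)

# Number of categories that need a color
n_datasets <- length(unique(datasaurus_dozen$dataset))

# Largest number of colors each palette provides
pals <- c("Set3", "Dark2", "Spectral", "BrBG")
pal_sizes <- setNames(brewer.pal.info[pals, "maxcolors"], pals)

n_datasets
pal_sizes
pal_sizes >= n_datasets  # TRUE where the palette is large enough
```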

Misleading comparisons

Perception of color can vary. (a,b) The same color can look different (a), and different colors can appear to be nearly the same by changing the background color (b). (c) The rectangles in the heat map indicated by the asterisks (*) are the same color but appear to be different.

Figure 1 from Wong (2010)

Common pitfalls / Recommendations

  • Ignoring color blindness

    • Use color-blind friendly color palettes when possible
  • Too Much Information

    • Use containment or other aesthetics to assist interpretation
    • Avoid using more than 6-8 colors in a plot (Wong, 2011)
  • Misleading comparisons

    • Viewers have difficulty mapping color changes to quantitative variables
  • Color scales

    • Consider how colors relate to each other, background
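One way to act on the first recommendation: a sketch using the Okabe-Ito palette, a widely used color-blind friendly set that ships with base R (version 4.0 or later) through `palette.colors()`. The `mpg` data from ggplot2 stand in for real data here.

```r
library(ggplot2)

# Eight Okabe-Ito colors; unname() so ggplot2 assigns them in order
okabe_ito <- unname(palette.colors(n = 8, palette = "Okabe-Ito"))

p <- ggplot(mpg, aes(x = displ, y = hwy, color = drv)) +
  geom_point() +
  # Three groups, so only the first three colors are used
  scale_color_manual(values = okabe_ito)
p
```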

Code Together Task

No Spice: Use fars2022_summary to create the plot on slide 6.

Weak Sauce: Use fars2022_summary to create the plot on slide 8. You will need to use the scale_color_brewer() option in ggplot2

Medium Spice: Use fars2022_summary and fars2022_regional_summary to create the plot on slide 11

Yoga Flame: Use fars2022_summary to create the plot on slide 9. You will need to install and load the ggrepel package and use geom_text_repel() or geom_label_repel()

Dim Mak: Use fars2022_summary to create the plot on slide 12. Hint: On slide 12 I give you code to create a dataset containing the polygons (the ‘convex hulls’). You will need to use geom_polygon()

References

Slowikowski, K., 2024. ggrepel: Automatically Position Non-Overlapping Text Labels with ‘ggplot2’. R package version 0.9.5, https://CRAN.R-project.org/package=ggrepel.

Wilke, C.O., 2019. Fundamentals of data visualization: a primer on making informative and compelling figures. O’Reilly Media.

Wong, B., 2010. Color coding. Nature Methods, 7(8), p. 573.

Wong, B., 2011. Color blindness. Nature Methods, 8(6), p. 441.